arXiv:1503.02495v1 [nlin.CD] 9 Mar 2015


An approach to comparing Kolmogorov-Sinai and 
permutation entropy 

Valentina A. Unakafova*$^{1,2}$, Anton M. Unakafov$^{1,2}$, and Karsten Keller$^{1}$

$^{1}$Institute of Mathematics, University of Lübeck
$^{2}$Graduate School for Computing in Medicine and Life Sciences, University of Lübeck


January 4, 2013 


Abstract 

In this paper we discuss the relationship between permutation entropy 
and Kolmogorov-Sinai entropy in the one-dimensional case. For this, 
we consider partitions of the state space of a dynamical system using 
ordinal patterns of order $d + n - 1$ on the one hand, and using $n$-letter
words of ordinal patterns of order $d$ on the other hand. The answer to
the question of how different these partitions are provides an approach 
to comparing the entropies. 


1 Introduction 

In this paper we discuss the relationship between permutation entropy,
introduced by Bandt and Pompe [1], and the well-known Kolmogorov-Sinai
entropy (KS entropy). A significant result in this direction, given by Bandt,
Keller, and Pompe [2], is the equality of both entropies for piecewise strictly
monotone interval maps. For many dynamical systems, KS entropy has been
shown to be not larger than permutation entropy [2-5]. Amigó et al. have
proved equality of KS entropy and permutation entropy for a slightly
different concept of permutation entropy [6, 7] (for a detailed discussion see [8]).

The representation of KS entropy on the basis of ordinal partitions given
in [3-5] allows one to relate permutation entropy and KS entropy. Roughly
speaking, ordinal partitions classify the points of the state space according
to the order types (ordinal patterns) of their orbits. The next step towards a better
understanding of the relationship between the entropies is to answer the question
of how much more information ordinal patterns of order $d + n - 1$ provide
than $n$ overlapping ordinal patterns of order $d$ [9]. Here we specialize the
considerations in [9] to the case of one-dimensional dynamical systems. At
this stage of research we do not have conclusive results, but we present some
new ideas in this direction.


1.1 Preliminaries 


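Throughout the paper, $(\Omega, \mathcal{B}(\Omega), \mu, T)$ is a measure-preserving dynamical
system, where $\Omega$ is an interval in $\mathbb{R}$, $\mathcal{B}(\Omega)$ is the Borel sigma-algebra on it,
$\mu : \mathcal{B}(\Omega) \to [0,1]$ is a probability measure with $\mu(\{\omega\}) = 0$ for all $\omega \in \Omega$,
and $T : \Omega \to \Omega$ is a $\mathcal{B}(\Omega)$-$\mathcal{B}(\Omega)$-measurable $\mu$-preserving transformation, i.e.
$\mu(T^{-1}(B)) = \mu(B)$ for all $B \in \mathcal{B}(\Omega)$.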

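The (Shannon) entropy of a finite partition $\mathcal{P} = \{P_1, P_2, \ldots, P_l\} \subset \mathcal{B}(\Omega)$
of $\Omega$ with respect to $\mu$ is defined by

$$H(\mathcal{P}) = -\sum_{P \in \mathcal{P}} \mu(P) \ln \mu(P)$$

(with $0 \ln 0 := 0$).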
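As a small illustration of this definition, the following sketch computes the entropy of a finite partition from the list of measures of its sets. The function name `shannon_entropy` and the input format are choices made here for illustration only.

```python
import math

def shannon_entropy(measures):
    """Entropy -sum mu(P) ln mu(P) of a finite partition, given the measures mu(P).

    Terms with mu(P) = 0 are skipped, matching the convention 0 ln 0 := 0.
    """
    if abs(sum(measures) - 1.0) > 1e-9:
        raise ValueError("the measures of a partition must sum to 1")
    return -sum(m * math.log(m) for m in measures if m > 0)

# A partition into three sets of measure 1/2, 1/4, 1/4 has entropy 1.5 * ln 2.
print(shannon_entropy([0.5, 0.25, 0.25]))
```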

The alphabet $A = \{1, 2, \ldots, l\}$ corresponding to a finite partition
$\mathcal{P} = \{P_1, P_2, \ldots, P_l\}$ provides words $a_1 a_2 \ldots a_n$ of given length $n$,
and the set $A^n$ of all such words provides a partition $\mathcal{P}_n$ of $\Omega$ into the sets

$$P_{a_1 a_2 \ldots a_n} = \{\omega \in P_{a_1}, T(\omega) \in P_{a_2}, \ldots, T^{\circ (n-1)}(\omega) \in P_{a_n}\}.$$

Here $T^{\circ t}$ denotes the $t$-th iterate of $T$.

The Kolmogorov-Sinai entropy (KS entropy) and the permutation entropy
of $T$ are defined by

$$h_\mu(T) = \sup_{\mathcal{P} \text{ finite partition}} \lim_{n \to \infty} \frac{H(\mathcal{P}_n)}{n}$$

and

$$h_\mu^*(T) = \lim_{d \to \infty} \frac{H(\mathcal{P}(d))}{d},$$

respectively, where $\mathcal{P}(d)$ is the ordinal partition we will consider in Section 2.


It was shown in [3-5] that in many cases ordinal partitions characterize
the KS entropy of $T$ in the following way:

$$h_\mu(T) = \lim_{d \to \infty} \lim_{n \to \infty} \frac{H(\mathcal{P}(d)_n)}{n}. \qquad (1)$$
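To make the inner quantity $H(\mathcal{P}(d)_n)/n$ in (1) more tangible, here is a rough numerical sketch. It is not part of the paper's argument and rests on assumptions: the logistic map $T(x) = 4x(1-x)$ is an illustrative choice, the measure of each word is replaced by its relative frequency along one orbit, and ordinal patterns are computed with the tie rule of Definition 1 in Section 2. Finite orbits give only a biased estimate.

```python
import math
from collections import Counter

def logistic(x):
    # Illustrative map T(x) = 4x(1 - x) on [0, 1]; an assumption, not from the paper.
    return 4.0 * x * (1.0 - x)

def ordinal_pattern(x):
    # Indices sorted by decreasing value; among equal values the larger index comes first.
    return tuple(sorted(range(len(x)), key=lambda i: (x[i], i), reverse=True))

def word_entropy_rate(T, w0, d, n, length=20000):
    """Empirical estimate of H(P(d)_n) / n from one orbit of T started at w0.

    Windows are read in forward time order; this only relabels the patterns,
    so the entropy of the resulting partition is unaffected.
    """
    xs = [w0]
    for _ in range(length + n + d - 1):
        xs.append(T(xs[-1]))
    counts = Counter(
        tuple(ordinal_pattern(xs[t + i : t + i + d + 1]) for i in range(n))
        for t in range(length)
    )
    entropy = -sum((c / length) * math.log(c / length) for c in counts.values())
    return entropy / n

print(word_entropy_rate(logistic, 0.3, d=3, n=4))
```

Since at most $((d+1)!)^n$ words occur, the returned value never exceeds $\ln((d+1)!)$.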


(The partition $\mathcal{P}(d)_n$ given $\mathcal{P}(d)$ fits into the general definition of $\mathcal{P}_n$ given
$\mathcal{P}$ as defined above.)

1.2 Relationship between KS and permutation entropy

For the following discussion, recall the main result from [9]. 

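Theorem 1. The following statements are equivalent for $h_\mu(T)$ satisfying (1):

(i) $h_\mu(T) = h_\mu^*(T)$.

(ii) For each $\varepsilon > 0$ there exists some $d_\varepsilon \in \mathbb{N}$ such that for all $d > d_\varepsilon$ there
is some $n_d \in \mathbb{N}$ with

$$H(\mathcal{P}(d + n - 1)) - H(\mathcal{P}(d)_n) < (n - 1)\varepsilon \text{ for all } n > n_d. \qquad (2)$$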

The purpose of the following discussion is to compare the partitions
$\mathcal{P}(d + n - 1)$ and $\mathcal{P}(d)_n$ and to answer the question of under what assumptions
(ii) in Theorem 1 holds and, more generally, to what extent these partitions
differ with increasing $d$ and $n$.

Let us define $V_d \subset \Omega$ as

$$\begin{aligned} V_d = {} & \{\omega \mid \omega < T^{\circ d}(\omega),\ T^{\circ l}(\omega) \notin (\omega, T^{\circ d}(\omega)) \text{ for all } l = 1, \ldots, d-1\} \\ & \cup \{\omega \mid \omega \geq T^{\circ d}(\omega),\ T^{\circ l}(\omega) \notin [T^{\circ d}(\omega), \omega] \text{ for all } l = 1, \ldots, d-1\}. \end{aligned} \qquad (3)$$

The sets $V_{d+1}, \ldots, V_{d+n-1}$, considered more closely in Section 2, allow one to
describe all elements of the partition $\mathcal{P}(d + n - 1)$ that are proper subsets
of some elements of the partition $\mathcal{P}(d)_n$. We are interested in showing that
the sets $V_d$ are small in a certain sense.
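Membership in $V_d$ depends only on the orbit segment $(\omega, T(\omega), \ldots, T^{\circ d}(\omega))$, so it can be checked directly. The sketch below follows the two cases of (3); the function name `in_V` and the representation of the orbit as a plain sequence are illustrative choices.

```python
def in_V(orbit, d):
    """Return True if w = orbit[0] lies in V_d according to (3).

    orbit is the sequence (w, T(w), ..., T^d(w)); only the first d + 1 entries are used.
    """
    w, wd = orbit[0], orbit[d]
    inner = orbit[1:d]  # the intermediate iterates T(w), ..., T^(d-1)(w)
    if w < wd:
        # first set in (3): no intermediate iterate in the open interval (w, T^d(w))
        return all(not (w < y < wd) for y in inner)
    # second set in (3): no intermediate iterate in the closed interval [T^d(w), w]
    return all(not (wd <= y <= w) for y in inner)

segment = (0.1, 0.9, 0.05, 0.3, 0.6)
print(in_V(segment, 3))  # True: neither 0.9 nor 0.05 lies between 0.1 and 0.3
print(in_V(segment, 4))  # False: 0.3 lies between 0.1 and 0.6
```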

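Recall that $T$ is said to be mixing or strong-mixing if for every $A, B \in \mathcal{B}(\Omega)$

$$\lim_{n \to \infty} \mu(T^{-\circ n}(A) \cap B) = \mu(A)\mu(B).$$

Theorem 2. If $T$ is mixing, then for all $\varepsilon > 0$ there exists some $d_\varepsilon \in \mathbb{N}$ such
that for all $d > d_\varepsilon$

$$\mu(V_d) < \varepsilon. \qquad (4)$$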
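The decay of $\mu(V_d)$ can be observed numerically. The following Monte Carlo sketch is an illustration under assumptions that are not taken from the paper: it uses the logistic map $T(x) = 4x(1-x)$ on $[0,1]$ with its invariant arcsine measure, a standard example that is assumed here to satisfy the hypotheses of the theorem, and it samples that measure via $\omega = \sin^2(\pi u / 2)$ with $u$ uniform on $[0,1]$. The names `logistic`, `in_V`, and `estimate_mu_V` are hypothetical helpers.

```python
import math
import random

def logistic(x):
    # Illustrative map; an assumption of this sketch.
    return 4.0 * x * (1.0 - x)

def in_V(orbit, d):
    # Membership of orbit[0] in V_d, following the two cases of definition (3).
    w, wd = orbit[0], orbit[d]
    inner = orbit[1:d]
    if w < wd:
        return all(not (w < y < wd) for y in inner)
    return all(not (wd <= y <= w) for y in inner)

def estimate_mu_V(d, samples=20000, seed=0):
    """Monte Carlo estimate of mu(V_d) for the logistic map and the arcsine measure."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        w = math.sin(0.5 * math.pi * rng.random()) ** 2  # arcsine-distributed point
        orbit = [w]
        for _ in range(d):
            orbit.append(logistic(orbit[-1]))
        hits += in_V(orbit, d)
    return hits / samples

for d in (1, 2, 4, 8, 12):
    print(d, estimate_mu_V(d))
```

For $d = 1$ the condition in (3) is empty, so the estimate is exactly 1. For this particular map a direct case analysis on the intervals $(0, 1/4)$, $(1/4, 3/4)$, and $(3/4, 1)$ suggests $\mu(V_2) = 2/3$, which the estimate should reproduce up to sampling error.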

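Theorem 3 provides a tool for comparing the "successive" partitions $\mathcal{P}(d+1)_{n-1}$
and $\mathcal{P}(d)_n$.

Theorem 3. For all $n \in \mathbb{N} \setminus \{1\}$ and $d \in \mathbb{N}$ it holds that

$$H(\mathcal{P}(d+1)_{n-1}) - H(\mathcal{P}(d)_n) \leq \ln 2\,(n-1)\,\mu(V_{d+1}). \qquad (5)$$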

Putting together Theorem 2 and Theorem 3, one gets a more explicit
variant of (2):

Corollary 4. If $T$ is mixing, then for all $\varepsilon > 0$ there exists some $d_\varepsilon \in \mathbb{N}$
such that for all $d > d_\varepsilon$, $n \in \mathbb{N} \setminus \{1\}$ it holds that

$$H(\mathcal{P}(d+1)_{n-1}) - H(\mathcal{P}(d)_n) < (n-1)\varepsilon.$$

Coming back to the partitions $\mathcal{P}(d+n-1)$ and $\mathcal{P}(d)_n$, in Section 4 we
obtain the following upper bound for $H(\mathcal{P}(d+n-1)) - H(\mathcal{P}(d)_n)$:

$$H(\mathcal{P}(d+n-1)) - H(\mathcal{P}(d)_n) \leq \ln 2 \sum_{i=1}^{n-1} (n-i)\,\mu(V_{d+i}) \qquad (6)$$

(compare with (2)).

Being the main results of our paper, Theorem 2, Theorem 3, and Corollary 4
shed some new light on the general problem of equality between
Kolmogorov-Sinai and permutation entropy in the one-dimensional case.

Section 2 gives a detailed description of $n$-letter words with ordinal
patterns of order $d$ as letters, of ordinal patterns themselves, and of their
connection to the sets $V_d$. In Section 3 we focus on the partitions $\mathcal{P}(d+1)_{n-1}$
and $\mathcal{P}(d)_n$ and prove Theorem 3. Section 4 is devoted to the relation between
the partitions $\mathcal{P}(d+n-1)$ and $\mathcal{P}(d)_n$ and provides (6). Finally, we prove
Theorem 2 in Section 5.

2 From ordinal patterns to words 

Let us recall the definition of ordinal patterns. 

Definition 1. Let $\Pi_d$ be the set of permutations of the set $\{0, 1, 2, \ldots, d\}$
for $d \in \mathbb{N}$. Then the real vector $(x_0, x_1, \ldots, x_d) \in \mathbb{R}^{d+1}$ has ordinal pattern
$\pi = (r_0, r_1, \ldots, r_d) \in \Pi_d$ of order $d$ if

$$x_{r_0} \geq x_{r_1} \geq \ldots \geq x_{r_d}$$

and

$$r_{l-1} > r_l \text{ in the case } x_{r_{l-1}} = x_{r_l}.$$
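Definition 1 amounts to sorting the indices by decreasing value and, among equal values, by decreasing index. A minimal sketch follows; the function name `ordinal_pattern` is an illustrative choice.

```python
def ordinal_pattern(x):
    """Ordinal pattern (r_0, ..., r_d) of the vector x = (x_0, ..., x_d).

    Indices are ordered so that x[r_0] >= x[r_1] >= ... >= x[r_d];
    for equal values the larger index comes first, as in Definition 1.
    """
    return tuple(sorted(range(len(x)), key=lambda i: (x[i], i), reverse=True))

print(ordinal_pattern((0.3, 0.8, 0.3)))  # -> (1, 2, 0): the tie x_0 = x_2 puts index 2 first
print(ordinal_pattern((1, 2, 3)))        # -> (2, 1, 0)
```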

We now divide the state space into sets of points having similar dynamics
from the ordinal viewpoint.

Definition 2. For $d \in \mathbb{N}$, the partition $\mathcal{P}(d) = \{P_\pi \mid \pi \in \Pi_d\}$ with

$$P_\pi = \{\omega \in \Omega \mid (T^{\circ d}(\omega), T^{\circ (d-1)}(\omega), \ldots, T(\omega), \omega) \text{ has ordinal pattern } \pi\}$$

is called the ordinal partition of order $d$ with respect to $T$.

A finer partition is obtained by considering more than one successive 
ordinal pattern. 

Definition 3. We say that a real vector $(x_0, x_1, \ldots, x_{d+n-1}) \in \mathbb{R}^{d+n}$ has
$(n, d)$-word $\pi_1 \pi_2 \ldots \pi_n$ if

$$(x_i, x_{i+1}, \ldots, x_{i+d}) \text{ has ordinal pattern } \pi_{i+1} \in \Pi_d \text{ for } i = 0, 1, \ldots, n-1.$$

The partition $\mathcal{P}(d)_n$ associated with the collection of $(n, d)$-words consists
of the sets

$$P_{\pi_1 \pi_2 \ldots \pi_n} = \{\omega \in P_{\pi_1}, T(\omega) \in P_{\pi_2}, \ldots, T^{\circ (n-1)}(\omega) \in P_{\pi_n}\}, \quad \pi_1, \pi_2, \ldots, \pi_n \in \Pi_d.$$
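The following sketch computes the $(n, d)$-word of a vector by applying Definition 3 literally to the vector as given, and shows two vectors that share a (3,1)-word but have different (1,3)-words. The names `ordinal_pattern` and `nd_word` and the sample vectors are illustrative.

```python
def ordinal_pattern(x):
    # Indices by decreasing value; among equal values the larger index first.
    return tuple(sorted(range(len(x)), key=lambda i: (x[i], i), reverse=True))

def nd_word(x, n, d):
    """(n, d)-word of x = (x_0, ..., x_{d+n-1}): n overlapping patterns of order d."""
    if len(x) != n + d:
        raise ValueError("the vector must have exactly n + d entries")
    return tuple(ordinal_pattern(x[i : i + d + 1]) for i in range(n))

x = (0.1, 0.5, 0.3, 0.4)
y = (0.35, 0.5, 0.3, 0.4)
print(nd_word(x, 3, 1) == nd_word(y, 3, 1))  # True: both go up, down, up
print(nd_word(x, 1, 3), nd_word(y, 1, 3))    # the order-3 patterns differ
```

The (3,1)-word records only the ups and downs between neighbours, so it cannot tell whether $x_0$ lies below or above $x_2$, which the (1,3)-word does record.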

Figure 1 illustrates a segment $(\omega, T(\omega), \ldots, T^{\circ 5}(\omega))$ of some orbit (a)
and the corresponding (5,1)-, (4,2)-, (3,3)-, (2,4)-, and (1,5)-words (b).


Figure 1: Representation of the segment of the orbit (a) by $(n, d)$-words (b)

Upon moving from (1,5)- to (5,1)-words one loses some information
about the ordering of the iterates of $T$. For example, the (3,3)-word
determines the relation

$$\omega < T^{\circ 3}(\omega),$$

but in the (4,2)-word this relation is already lost: either $\omega > T^{\circ 3}(\omega)$
or $\omega < T^{\circ 3}(\omega)$ may hold.

On the other hand, one does not lose the relation

$$\omega < T^{\circ 4}(\omega)$$

when moving from the (2,4)-word to the (3,3)-word, although $\omega$ and $T^{\circ 4}(\omega)$
are in different patterns of the (3,3)-word. The reason for this is the
existence of the intermediate iterate $T^{\circ 3}(\omega)$ with

$$\omega < T^{\circ 3}(\omega) < T^{\circ 4}(\omega).$$

More generally, if there is some intermediate iterate $T^{\circ l}(\omega)$ with
$\omega < T^{\circ l}(\omega) < T^{\circ (d+1)}(\omega)$ or $T^{\circ (d+1)}(\omega) \leq T^{\circ l}(\omega) \leq \omega$, the relation between $\omega$ and
$T^{\circ (d+1)}(\omega)$ is not lost upon moving from $(1, d+1)$- to $(2, d)$-words, and is
lost otherwise. Therefore, the set $V_{d+1}$ (see (3)) consists of all $\omega$ for which
the relation between $\omega$ and $T^{\circ (d+1)}(\omega)$ is lost upon moving from $(1, d+1)$-
to $(2, d)$-words. More precisely, the set $V_{d+1}$ is a union of the sets of the
partition $\mathcal{P}(d+1)$ that are proper subsets of some sets of the partition $\mathcal{P}(d)_2$.
Figure 2 illustrates $\omega, T^{\circ 2}(\omega) \in V_3$ for our example.



Figure 2: $\omega \in V_3$ (a), $T^{\circ 2}(\omega) \in V_3$ (b)

In the following section we compare the partitions $\mathcal{P}(d)_n$ and $\mathcal{P}(d+1)_{n-1}$
by means of the set $V_{d+1}$.

3 The partitions $\mathcal{P}(d+1)_{n-1}$ and $\mathcal{P}(d)_n$

Upon moving from $(n-1, d+1)$- to $(n, d)$-words, for $i = 0, 1, \ldots, n-2$ the
relation between $T^{\circ i}(\omega)$ and $T^{\circ (d+i+1)}(\omega)$ is lost iff $T^{\circ i}(\omega) \in V_{d+1}$. Therefore,
if $V_{d+1} \neq \emptyset$, then the partition $\mathcal{P}(d+1)_{n-1}$ is properly finer than the partition
$\mathcal{P}(d)_n$. The following holds:

Proposition 5. Given $P \in \mathcal{P}(d)_n$, let $k = \#\{l \in \{0, 1, \ldots, n-2\} \mid P \subset T^{-\circ l}(V_{d+1})\}$.
Then there exist $2^k$ sets $P_1, P_2, \ldots, P_{2^k} \in \mathcal{P}(d+1)_{n-1}$ with

$$P_1 \cup P_2 \cup \ldots \cup P_{2^k} = P.$$

Proof. Consider some $P \in \mathcal{P}(d)_n$ and the corresponding $(n, d)$-word. Since
the $(n, d)$-word determines the same dynamics for all $\omega \in P$, for $l =
0, 1, \ldots, n-2$ it holds either

$$P \subset T^{-\circ l}(V_{d+1}) \qquad (7)$$

or

$$P \cap T^{-\circ l}(V_{d+1}) = \emptyset. \qquad (8)$$

For each $l$ with (7) and all $\omega \in P$, either $T^{\circ l}(\omega) < T^{\circ (d+l+1)}(\omega)$ or $T^{\circ (d+l+1)}(\omega) \leq
T^{\circ l}(\omega)$, providing a division of $P$ into two subsets. We are done since there
are exactly $k$ such divisions. □

Figure 3: From (3,1)- to (2,2)-words, panels (a), (b), (c). $\overline{V_2 \cup T^{-1}(V_2)}$ in (a) stands for the
complement of $V_2 \cup T^{-1}(V_2)$


Figure 3 illustrates Proposition 5. For $\omega \in \overline{V_2 \cup T^{-1}(V_2)}$ the obtained
(3,1)-word is not divided and contains the same information about the
ordering as $2^0 = 1$ (2,2)-word (a), for $\omega \in V_2$ the (3,1)-word is divided into
$2^1 = 2$ (2,2)-words (b), and for $\omega \in V_2 \cap T^{-1}(V_2)$ the (3,1)-word is divided
into $2^2 = 4$ (2,2)-words (c).
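The number $k$ in Proposition 5 can be read off from any orbit segment $(\omega, T(\omega), \ldots, T^{\circ (n+d-1)}(\omega))$ of a point in $P$: it counts the indices $l \in \{0, \ldots, n-2\}$ with $T^{\circ l}(\omega) \in V_{d+1}$. The sketch below does this for the case $n = 3$, $d = 1$ of Figure 3; the helper names `in_V` and `k_of_point` and the sample segments are illustrative.

```python
def in_V(orbit, d):
    # Membership of orbit[0] in V_d, following the two cases of definition (3).
    w, wd = orbit[0], orbit[d]
    inner = orbit[1:d]
    if w < wd:
        return all(not (w < y < wd) for y in inner)
    return all(not (wd <= y <= w) for y in inner)

def k_of_point(segment, n, d):
    """Count l in {0, ..., n-2} with T^l(w) in V_{d+1}, for segment = (w, ..., T^(n+d-1)(w))."""
    if len(segment) != n + d:
        raise ValueError("segment must contain n + d points")
    return sum(in_V(segment[l : l + d + 2], d + 1) for l in range(n - 1))

# (3,1)-words versus (2,2)-words, i.e. n = 3 and d = 1.
for seg in [(0.1, 0.2, 0.3, 0.4), (0.1, 0.2, 0.3, 0.15), (0.5, 0.9, 0.1, 0.6)]:
    k = k_of_point(seg, 3, 1)
    print(seg, "k =", k, "pieces =", 2 ** k)
```

The three segments give $k = 0$, $k = 1$, and $k = 2$, matching the cases of one, two, and four (2,2)-words.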

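Let $k(P)$ be determined as in Proposition 5 for each $P \in \mathcal{P}(d)_n$. Since
for each $P$ it holds either (7) or (8), it follows that

$$\sum_{j=0}^{n-2} \mu(T^{-\circ j}(V_{d+1})) = \sum_{j=0}^{n-2} \sum_{P \in \mathcal{P}(d)_n} \mu(T^{-\circ j}(V_{d+1}) \cap P) = \sum_{P \in \mathcal{P}(d)_n} k(P)\,\mu(P). \qquad (9)$$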

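Therefore, by Proposition 5 and (9) one obtains an upper bound for
$H(\mathcal{P}(d+1)_{n-1}) - H(\mathcal{P}(d)_n)$ in the following way:

$$\begin{aligned} H(\mathcal{P}(d+1)_{n-1}) - H(\mathcal{P}(d)_n) &\leq \sum_{P \in \mathcal{P}(d)_n} \left( \mu(P) \ln \mu(P) - 2^{k(P)}\,\frac{\mu(P)}{2^{k(P)}} \ln \frac{\mu(P)}{2^{k(P)}} \right) \\ &= \ln 2 \sum_{P \in \mathcal{P}(d)_n} k(P)\,\mu(P) = \ln 2 \sum_{j=0}^{n-2} \mu(T^{-\circ j}(V_{d+1})) = \ln 2\,(n-1)\,\mu(V_{d+1}). \end{aligned} \qquad (10)$$

Inequality (10) provides the proof of Theorem 3.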
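For the reader's convenience, the two steps behind (10) can be spelled out; this is a sketch of the standard reasoning rather than a quotation from the proof. The inequality uses the fact that, among all divisions of a set $P$ into $2^{k}$ parts, the contribution to the entropy is largest when every part has measure $\mu(P)/2^{k}$. The first equality is then the following computation for a single $P$ with $k = k(P)$:

```latex
\begin{aligned}
\mu(P)\ln\mu(P) - 2^{k}\,\frac{\mu(P)}{2^{k}}\ln\frac{\mu(P)}{2^{k}}
  &= \mu(P)\ln\mu(P) - \mu(P)\bigl(\ln\mu(P) - k\ln 2\bigr) \\
  &= \ln 2 \; k\,\mu(P).
\end{aligned}
```

The last equality in (10) holds because $T$ preserves $\mu$, so each of the $n-1$ summands $\mu(T^{-\circ j}(V_{d+1}))$ equals $\mu(V_{d+1})$.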

4 The partitions $\mathcal{P}(d)_n$ and $\mathcal{P}(d+n-1)$

Here we move from $(1, d+n-1)$-words (i.e. ordinal patterns of order
$d+n-1$) to $(n, d)$-words. At this point we cannot definitely say into how
many $(1, d+n-1)$-words an $(n, d)$-word is divided, depending on the sets
$V_{d+1}, \ldots, V_{d+n-1}$.

Let us give an example. Figure 4 illustrates a (3,1)-word with the same
information as in the (1,3)-word (a); two other (3,1)-words are divided into
three and five (1,3)-words ((b) and (c), respectively).


Figure 4: From (3,1)- to (1,3)-words, panels (a), (b), (c). $\overline{V_2 \cup T^{-1}(V_2) \cup V_3}$ in (a) stands for
the complement of $V_2 \cup T^{-1}(V_2) \cup V_3$

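One obtains an upper bound for $H(\mathcal{P}(d+n-1)) - H(\mathcal{P}(d)_n)$ by successive
application of (10):

$$\begin{aligned} H(\mathcal{P}(d+n-1)) - H(\mathcal{P}(d)_n) &= \sum_{i=1}^{n-1} \bigl( H(\mathcal{P}(d+n-i)_i) - H(\mathcal{P}(d+n-i-1)_{i+1}) \bigr) \\ &\leq \ln 2 \sum_{i=1}^{n-1} i\,\mu(V_{d+n-i}) = \ln 2 \sum_{i=1}^{n-1} (n-i)\,\mu(V_{d+i}). \end{aligned} \qquad (11)$$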
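Given values of $\mu(V_{d+1}), \ldots, \mu(V_{d+n-1})$, the right-hand side of the bound is a weighted sum that is straightforward to evaluate. The sketch below does so; the function name `entropy_gap_bound` and the sample numbers are purely illustrative and are not measured values.

```python
import math

def entropy_gap_bound(mu_V):
    """Evaluate ln 2 * sum_{i=1}^{n-1} (n - i) * mu(V_{d+i}).

    mu_V[i - 1] is taken to be mu(V_{d+i}) for i = 1, ..., n - 1, so n = len(mu_V) + 1.
    """
    n = len(mu_V) + 1
    return math.log(2) * sum((n - i) * m for i, m in enumerate(mu_V, start=1))

# Hypothetical measures for n = 3: mu(V_{d+1}) = 0.5 and mu(V_{d+2}) = 0.25.
print(entropy_gap_bound([0.5, 0.25]))  # ln 2 * (2 * 0.5 + 1 * 0.25) = 1.25 * ln 2
```

The weight $n - i$ shows that the sets $V_{d+i}$ with small $i$ contribute most, which is why the decay rate of $\mu(V_d)$ in $d$ matters when comparing with condition (ii) of Theorem 1.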

Comparing (2) and (11), it is natural to ask how fast the measure of the
set $V_d$ decreases with increasing $d$. This question is the subject of current
research.


5 Proof of Theorem 2

In the following, we assume that $T$ is strong-mixing; however, some parts of
the proof need only the weaker assumption of ergodicity, as we will indicate.

Lemma 6. Let $T$ be ergodic. Given an interval $A \subset \Omega$ and $d \in \mathbb{N} \setminus \{1\}$, let
$V_d(A)$ be the set of points $\omega \in A$ for which at least one of the two following
conditions holds:

$$T^{\circ l}(\omega) \notin \{a \in A \mid a < \omega\} \text{ for all } l = 1, \ldots, d-1, \qquad (12)$$

$$T^{\circ l}(\omega) \notin \{a \in A \mid a > \omega\} \text{ for all } l = 1, \ldots, d-1. \qquad (13)$$

Then for all $\varepsilon > 0$ there exists some $d_\varepsilon \in \mathbb{N}$ such that $\mu(V_d(A)) < \varepsilon$ for all
$d > d_\varepsilon$.

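Proof. Let $V_d^L$ be the set of points $\omega \in A$ satisfying (12). Then it is sufficient to
show $\mu(V_d^L) < \frac{\varepsilon}{2}$ for the corresponding $d$, since for points satisfying (13) the
proof is completely analogous.

Consider a partition $\{B_i\}_{i=1}^{\infty}$ of $A$ into intervals $B_i$ with the following
properties:

(i) $\mu(B_i) = \frac{\mu(A)}{2^i}$ for all $i \in \mathbb{N}$,

(ii) for all $i < j$ and for all $\omega_1 \in B_i$, $\omega_2 \in B_j$ it holds that $\omega_1 > \omega_2$.

Since $\mu(\{\omega\}) = 0$ for all $\omega \in \Omega$, such a partition always exists.

Define $D_{i,d} = \{\omega \in B_i \mid T^{\circ l}(\omega) \notin \bigcup_{j=i+1}^{\infty} B_j \text{ for all } l = 1, \ldots, d-1\}$. It
holds that

$$D_{i,d} = B_i \cap \bigcap_{l=1}^{d-1} T^{-\circ l}\Bigl(\Omega \setminus \bigcup_{j=i+1}^{\infty} B_j\Bigr). \qquad (14)$$

For all $d \in \mathbb{N}$, (12) provides $V_d^L \subset \bigcup_{i=1}^{\infty} D_{i,d}$ and, since $D_{i,d} \subset B_i$, it holds that

$$\begin{aligned} \mu(V_d^L) &\leq \mu\Bigl(\bigcup_{i=1}^{\infty} D_{i,d}\Bigr) = \sum_{i=1}^{\infty} \mu(D_{i,d}) = \sum_{i=1}^{k} \mu(D_{i,d}) + \sum_{i=k+1}^{\infty} \mu(D_{i,d}) \\ &\leq \sum_{i=1}^{k} \mu(D_{i,d}) + \sum_{i=k+1}^{\infty} \frac{\mu(A)}{2^i} \leq \sum_{i=1}^{k} \mu(D_{i,d}) + \frac{1}{2^k} \end{aligned} \qquad (15)$$

for all $k \in \mathbb{N}$. On the other hand, by the ergodicity of $T$ (compare [10]) and
by (14) we have

$$\mu\Bigl(\bigcap_{d=1}^{\infty} D_{i,d}\Bigr)\,\mu\Bigl(\bigcup_{j=i+1}^{\infty} B_j\Bigr) = \lim_{m \to \infty} \frac{1}{m} \sum_{l=0}^{m-1} \mu\Bigl(\bigcap_{d=1}^{\infty} D_{i,d} \cap T^{-\circ l}\Bigl(\bigcup_{j=i+1}^{\infty} B_j\Bigr)\Bigr) = 0.$$

Therefore, $\mu(\bigcup_{j=i+1}^{\infty} B_j) > 0$ implies $\mu(\bigcap_{d=1}^{\infty} D_{i,d}) = 0$ and, since $D_{i,1} \supset D_{i,2} \supset
\ldots$, it holds that

$$\lim_{d \to \infty} \mu(D_{i,d}) = \mu\Bigl(\bigcap_{d=1}^{\infty} D_{i,d}\Bigr) = 0$$

for all $i \in \mathbb{N}$.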

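Now let $\varepsilon > 0$. Fix some $k \in \mathbb{N}$ with $k > \log_2 \frac{4}{\varepsilon}$ and $d_\varepsilon$ with $\mu(D_{i,d}) < \frac{\varepsilon}{4k}$
for all $i = 1, 2, \ldots, k$ and $d > d_\varepsilon$. Then, owing to (15), for $d > d_\varepsilon$ the set $V_d^L$ of points satisfying (12) has measure

$$\mu(V_d^L) \leq \sum_{i=1}^{k} \mu(D_{i,d}) + \frac{1}{2^k} < \frac{\varepsilon}{4} + \frac{\varepsilon}{4} = \frac{\varepsilon}{2},$$

completing the proof. □

We now come to the proof of Theorem 2. Given $\varepsilon > 0$, let $r > \frac{2}{\varepsilon}$ be a natural number
and let $\{A_i\}_{i=1}^{r}$ be a partition of $\Omega$ into intervals $A_i$ with $\mu(A_i) = \frac{1}{r}$.
Furthermore, fix some $d_\varepsilon \in \mathbb{N}$ with

$$\mu(A_i \cap T^{-\circ d}(A_i)) < \mu^2(A_i) + \frac{\varepsilon}{3r} = \frac{1}{r^2} + \frac{\varepsilon}{3r} \qquad (16)$$

and

$$\mu(V_d(A_i)) < \frac{\varepsilon}{6r} \qquad (17)$$

for all $i = 1, 2, \ldots, r$ and all $d > d_\varepsilon$, which is possible by the strong-mixing
of $T$ and by Lemma 6, respectively.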

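For $\omega \in V_d \cap A_i$ it is impossible that both $T^{\circ d}(\omega) \notin A_i$ and $\omega \notin V_d(A_i)$,
where $V_d(A_i)$ is the set from Lemma 6, implying

$$V_d = \bigcup_{i=1}^{r} (V_d \cap A_i) \subset \bigcup_{i=1}^{r} \bigl((A_i \cap T^{-\circ d}(A_i)) \cup V_d(A_i)\bigr) = \bigcup_{i=1}^{r} (A_i \cap T^{-\circ d}(A_i)) \cup \bigcup_{i=1}^{r} V_d(A_i).$$

From this, (16), and (17), one obtains

$$\mu(V_d) \leq \sum_{i=1}^{r} \mu(A_i \cap T^{-\circ d}(A_i)) + \sum_{i=1}^{r} \mu(V_d(A_i)) < r\Bigl(\frac{1}{r^2} + \frac{\varepsilon}{3r}\Bigr) + \frac{\varepsilon}{6} < \varepsilon.$$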

Remark. The technical assumption that $\mu(\{\omega\}) = 0$ for all $\omega \in \Omega$ is rather
weak. In the ergodic (resp. strong-mixing) case, $\mu(\{\omega\}) > 0$ would imply
that $\omega$ is a periodic (resp. fixed) point and that $\mu$ is concentrated on the
orbit of $\omega$ (resp. on $\omega$).

This work was supported by the Graduate School for Computing in
Medicine and Life Sciences funded by Germany's Excellence Initiative [DFG
GSC 235/1].